
    Towards a More Rigorous Science of Blindspot Discovery in Image Models

    A growing body of work studies Blindspot Discovery Methods (BDMs): methods that use an image embedding to find semantically meaningful (i.e., united by a human-understandable concept) subsets of the data where an image classifier performs significantly worse. Motivated by gaps observed in prior work, we introduce SpotCheck, a new framework for evaluating BDMs that uses synthetic image datasets to train models with known blindspots, and PlaneSpot, a new BDM that uses a 2D image representation. We use SpotCheck to run controlled experiments that identify factors that influence BDM performance (e.g., the number of blindspots in a model, or the features used to define a blindspot) and show that PlaneSpot is competitive with, and in many cases outperforms, existing BDMs. Importantly, we validate these findings with additional experiments on real image data from MS-COCO, a large image benchmark dataset. Our findings suggest several promising directions for future work on BDM design and evaluation. Overall, we hope that the methodology and analyses presented in this work will help facilitate a more rigorous science of blindspot discovery.
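
    The abstract does not spell out PlaneSpot's algorithm, but the general recipe such methods share can be sketched: embed the data in low dimension, group the embedding into candidate subsets, and rank the subsets by error rate. The sketch below is a hypothetical illustration under stated assumptions; the 2D embedding (PCA here), the clustering step (k-means), and the function names are stand-ins, not the paper's actual implementation.

```python
# A minimal sketch of the generic blindspot-discovery recipe, assuming
# a PCA 2D embedding and k-means clustering as stand-ins for the
# paper's actual choices.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def discover_blindspots(features, correct, n_clusters=10, top_k=3):
    """Return the top_k clusters with the highest error rate.

    features: (n, d) image embeddings from some pretrained encoder
    correct:  (n,) boolean array, True where the classifier was right
    """
    coords = PCA(n_components=2).fit_transform(features)   # 2D representation
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(coords)
    clusters = []
    for c in range(n_clusters):
        mask = labels == c
        error_rate = 1.0 - correct[mask].mean()
        clusters.append((error_rate, np.flatnonzero(mask)))
    clusters.sort(key=lambda t: t[0], reverse=True)         # worst first
    return clusters[:top_k]  # candidate blindspots for human review

# Toy usage with random stand-in data:
rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 512))
correct = rng.random(1000) > 0.2
for err, idx in discover_blindspots(feats, correct):
    print(f"cluster of {len(idx)} images, error rate {err:.2f}")
```

    The human-in-the-loop step is what distinguishes a blindspot from a mere high-error cluster: the returned subsets are only useful if a reviewer can name the concept that unites them.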

    OpenXAI: Towards a Transparent Evaluation of Model Explanations

    While several types of post hoc explanation methods (e.g., feature attribution methods) have been proposed in recent literature, there is little to no work on systematically benchmarking these methods in an efficient and transparent manner. Here, we introduce OpenXAI, a comprehensive and extensible open-source framework for evaluating and benchmarking post hoc explanation methods. OpenXAI comprises the following key components: (i) a flexible synthetic data generator and a collection of diverse real-world datasets, pre-trained models, and state-of-the-art feature attribution methods; (ii) open-source implementations of twenty-two quantitative metrics for evaluating the faithfulness, stability (robustness), and fairness of explanation methods; and (iii) the first ever public XAI leaderboards for benchmarking explanations. OpenXAI is easily extensible, as users can readily evaluate custom explanation methods and incorporate them into our leaderboards. Overall, OpenXAI provides an automated end-to-end pipeline that not only simplifies and standardizes the evaluation of post hoc explanation methods, but also promotes transparency and reproducibility in benchmarking these methods. OpenXAI datasets and data loaders, implementations of state-of-the-art explanation methods and evaluation metrics, and leaderboards are publicly available at https://open-xai.github.io/. (36th Conference on Neural Information Processing Systems, NeurIPS 2022, Datasets and Benchmarks Track.)
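
    To make "faithfulness" concrete: one common style of metric asks how well attribution scores track the actual change in model output when each feature is removed. The sketch below is an illustrative assumption, not OpenXAI's exact implementation; the function name, the zero-baseline masking, and the toy model are all hypothetical.

```python
# A minimal faithfulness-style metric, assuming a single-output model
# and a zero baseline for masked features; OpenXAI's twenty-two metrics
# may define this differently.
import numpy as np

def faithfulness(model_fn, x, attributions):
    """Pearson correlation between attribution scores and the drop in
    model output caused by zeroing out each feature in turn."""
    base_pred = model_fn(x)
    drops = np.empty(len(x))
    for i in range(len(x)):
        x_masked = x.copy()
        x_masked[i] = 0.0                       # remove feature i
        drops[i] = base_pred - model_fn(x_masked)
    return np.corrcoef(attributions, drops)[0, 1]

# Toy linear model: attributions w * x perfectly explain the output,
# so the score should be ~1.0.
w = np.array([3.0, -2.0, 0.5, 0.0])
model = lambda x: float(w @ x)
x = np.array([1.0, 1.0, 1.0, 1.0])
print(faithfulness(model, x, attributions=w * x))  # ~1.0
```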

    Effects of fluoxetine on functional outcomes after acute stroke (FOCUS): a pragmatic, double-blind, randomised, controlled trial

    Background: Results of small trials indicate that fluoxetine might improve functional outcomes after stroke. The FOCUS trial aimed to provide a precise estimate of these effects.
    Methods: FOCUS was a pragmatic, multicentre, parallel-group, double-blind, randomised, placebo-controlled trial done at 103 hospitals in the UK. Patients were eligible if they were aged 18 years or older, had a clinical diagnosis of stroke with focal neurological deficits, and were enrolled and randomly assigned between 2 days and 15 days after onset. Patients were randomly allocated fluoxetine 20 mg or matching placebo orally once daily for 6 months via a web-based system using a minimisation algorithm. The primary outcome was functional status, measured with the modified Rankin Scale (mRS), at 6 months. Patients, carers, health-care staff, and the trial team were masked to treatment allocation. Functional status was assessed at 6 months and 12 months after randomisation. Patients were analysed according to their treatment allocation. This trial is registered with the ISRCTN registry, number ISRCTN83290762.
    Findings: Between Sept 10, 2012, and March 31, 2017, 3127 patients were recruited; 1564 were allocated fluoxetine and 1563 placebo. mRS data at 6 months were available for 1553 (99·3%) patients in each treatment group. The distribution across mRS categories at 6 months was similar in the fluoxetine and placebo groups (common odds ratio adjusted for minimisation variables 0·951 [95% CI 0·839–1·079]; p=0·439). Patients allocated fluoxetine were less likely than those allocated placebo to develop new depression by 6 months (210 [13·43%] patients vs 269 [17·21%]; difference 3·78% [95% CI 1·26–6·30]; p=0·0033), but they had more bone fractures (45 [2·88%] vs 23 [1·47%]; difference 1·41% [95% CI 0·38–2·43]; p=0·0070). There were no significant differences in any other event at 6 or 12 months.
    Interpretation: Fluoxetine 20 mg given daily for 6 months after acute stroke does not seem to improve functional outcomes. Although the treatment reduced the occurrence of depression, it increased the frequency of bone fractures. These results do not support the routine use of fluoxetine either for the prevention of post-stroke depression or to promote recovery of function.
    Funding: UK Stroke Association and NIHR Health Technology Assessment Programme.
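
    The absolute risk differences quoted above can be reproduced from the event counts with a standard two-proportion Wald 95% CI. The snippet below is only a cross-check under that assumption; the trial's own analysis may have used a different interval method.

```python
# Cross-checking the reported risk differences with a two-proportion
# Wald 95% confidence interval (an approximation, not necessarily the
# trial's exact method).
from math import sqrt

def risk_difference(events1, n1, events2, n2, z=1.96):
    """Return (p2 - p1) and its Wald 95% CI."""
    p1, p2 = events1 / n1, events2 / n2
    diff = p2 - p1
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se

# New depression by 6 months: placebo 269/1563 minus fluoxetine 210/1564
print(risk_difference(210, 1564, 269, 1563))  # ~ (0.0378, 0.0126, 0.0630)
# Bone fractures: fluoxetine 45/1564 minus placebo 23/1563
print(risk_difference(23, 1563, 45, 1564))    # ~ (0.0141, 0.0038, 0.0243)
```

    Both intervals match the published 1·26–6·30 and 0·38–2·43 ranges, which is consistent with a simple unadjusted two-proportion comparison for these secondary outcomes.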

    The evolving SARS-CoV-2 epidemic in Africa: Insights from rapidly expanding genomic surveillance

    INTRODUCTION: Investment in Africa over the past year with regard to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequencing has led to a massive increase in the number of sequences, which, to date, exceeds 100,000 sequences generated to track the pandemic on the continent. These sequences have profoundly affected how public health officials in Africa have navigated the COVID-19 pandemic.
    RATIONALE: We demonstrate how the first 100,000 SARS-CoV-2 sequences from Africa have helped monitor the epidemic on the continent, how genomic surveillance expanded over the course of the pandemic, and how we adapted our sequencing methods to deal with an evolving virus. Finally, we also examine how viral lineages have spread across the continent in a phylogeographic framework to gain insights into the underlying temporal and spatial transmission dynamics for several variants of concern (VOCs).
    RESULTS: Our results indicate that the number of countries in Africa that can sequence the virus within their own borders is growing and that this is coupled with a shorter turnaround time from sampling to sequence submission. Ongoing evolution necessitated the continual updating of primer sets, and, as a result, eight primer sets were designed in tandem with viral evolution and used to ensure effective sequencing of the virus. The pandemic unfolded through multiple waves of infection that were each driven by distinct genetic lineages, with B.1-like ancestral strains associated with the first pandemic wave of infections in 2020. Successive waves on the continent were fueled by different VOCs, with Alpha and Beta cocirculating in distinct spatial patterns during the second wave and Delta and Omicron affecting the whole continent during the third and fourth waves, respectively. Phylogeographic reconstruction points toward distinct differences in viral importation and exportation patterns associated with the Alpha, Beta, Delta, and Omicron variants and subvariants, when considering both Africa versus the rest of the world and viral dissemination within the continent. Our epidemiological and phylogenetic inferences therefore underscore the heterogeneous nature of the pandemic on the continent and highlight key insights and challenges, for instance, recognizing the limitations of low testing proportions. We also highlight the early warning capacity that genomic surveillance in Africa has had for the rest of the world with the detection of new lineages and variants, the most recent being the characterization of various Omicron subvariants.
    CONCLUSION: Sustained investment for diagnostics and genomic surveillance in Africa is needed as the virus continues to evolve. This is important not only to help combat SARS-CoV-2 on the continent but also because it can be used as a platform to help address the many emerging and reemerging infectious disease threats in Africa. In particular, capacity building for local sequencing within countries or within the continent should be prioritized, because this is generally associated with shorter turnaround times, providing the most benefit to local public health authorities tasked with pandemic response and mitigation and allowing for the fastest reaction to localized outbreaks. These investments are crucial for pandemic preparedness and response and will serve the health of the continent well into the 21st century.

    Where Does My Model Underperform? A Human Evaluation of Slice Discovery Algorithms

    Machine learning (ML) models that achieve high average accuracy can still underperform on semantically coherent subsets ("slices") of data. This behavior can have significant societal consequences for the safety or bias of the model in deployment, but identifying these underperforming slices can be difficult in practice, especially in domains where practitioners lack access to group annotations to define coherent subsets of their data. Motivated by these challenges, ML researchers have developed new slice discovery algorithms that aim to group together coherent, high-error subsets of data. However, there has been little evaluation focused on whether these tools help humans form correct hypotheses about where (i.e., for which groups) their model underperforms. We conduct a controlled user study (N = 15) in which we show users 40 slices output by two state-of-the-art slice discovery algorithms and ask them to form hypotheses about an object detection model. Our results provide positive evidence that these tools offer some benefit over a naive baseline, and they shed light on challenges users face during the hypothesis formation step. We conclude by discussing design opportunities for ML and HCI researchers. Our findings point to the importance of centering users when creating and evaluating new tools for slice discovery.